eval: add typed training config to Hydra and feedback requests#38
Conversation
Important: Review skipped — auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.
📝 Walkthrough
A training configuration system is introduced across the evaluation module, adding typed `EvalTrainingConfig` and `HarnessConfig` dataclasses, implementing validation logic, and propagating training settings through feedback payloads during execution.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Config as Config File (YAML)
    participant Loader as Config Loader
    participant Validator as build_harness_config()
    participant Runner as runner.py
    participant Feedback as FeedbackItem
    Config->>Loader: Load EvalConfig with training section
    Loader->>Validator: Pass EvalConfig (with EvalTrainingConfig)
    Validator->>Validator: Type check: training is EvalTrainingConfig?
    Validator->>Validator: Construct TrainingConfig from fields
    Validator->>Runner: Return HarnessConfig (with TrainingConfig)
    Runner->>Runner: Type check: config.training is TrainingConfig?
    Runner->>Runner: Extract training_cfg variable
    Runner->>Runner: Serialize training config to metadata.json
    Runner->>Feedback: Create FeedbackItem with training_cfg
    Feedback->>Feedback: Include training config in payload
```
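The flow in the diagram can be sketched with plain dataclasses. This is illustrative, not the actual module code: the class and field names mirror the ones named in this review, while the defaults and the single-field `EvalConfig`/`HarnessConfig` shapes are assumptions.

```python
import dataclasses
import json

# Illustrative field set, taken from the fields named in this review;
# the default values are hypothetical.
@dataclasses.dataclass
class EvalTrainingConfig:
    learning_rate: float = 1e-4
    alpha: float = 0.5
    is_clip: bool = True
    max_grad_norm: float = 1.0
    kl_reg_weight: float = 0.0
    teacher_top_k: int = 50

@dataclasses.dataclass
class TrainingConfig:
    learning_rate: float
    alpha: float
    is_clip: bool
    max_grad_norm: float
    kl_reg_weight: float
    teacher_top_k: int

@dataclasses.dataclass
class EvalConfig:
    training: EvalTrainingConfig

@dataclasses.dataclass
class HarnessConfig:
    training: TrainingConfig

def build_harness_config(eval_cfg: EvalConfig) -> HarnessConfig:
    # Type check, then field-by-field construction, as in the diagram.
    if not isinstance(eval_cfg.training, EvalTrainingConfig):
        raise TypeError(f"expected EvalTrainingConfig, got {type(eval_cfg.training)!r}")
    training = TrainingConfig(
        learning_rate=eval_cfg.training.learning_rate,
        alpha=eval_cfg.training.alpha,
        is_clip=eval_cfg.training.is_clip,
        max_grad_norm=eval_cfg.training.max_grad_norm,
        kl_reg_weight=eval_cfg.training.kl_reg_weight,
        teacher_top_k=eval_cfg.training.teacher_top_k,
    )
    return HarnessConfig(training=training)

harness = build_harness_config(EvalConfig(training=EvalTrainingConfig()))
# The runner then serializes the training config (e.g. into metadata.json)
# and attaches it to each feedback payload.
metadata = json.dumps(dataclasses.asdict(harness.training), sort_keys=True)
```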
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
🧹 Nitpick comments (1)
claas/eval/config.py (1)
Lines 30-37: Consider collapsing the manual training field mapping to reduce schema drift. A field-by-field copy is easy to miss when `EvalTrainingConfig`/`TrainingConfig` evolves.

♻️ Suggested simplification:

```diff
-    training = TrainingConfig(
-        learning_rate=eval_cfg.training.learning_rate,
-        alpha=eval_cfg.training.alpha,
-        is_clip=eval_cfg.training.is_clip,
-        max_grad_norm=eval_cfg.training.max_grad_norm,
-        kl_reg_weight=eval_cfg.training.kl_reg_weight,
-        teacher_top_k=eval_cfg.training.teacher_top_k,
-    )
+    training = TrainingConfig(**dataclasses.asdict(eval_cfg.training))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@claas/eval/config.py` around lines 30 - 37, Replace the manual field-by-field construction of TrainingConfig by unpacking the training sub-config directly to avoid schema drift: construct TrainingConfig by passing the EvalTrainingConfig instance's mapping (e.g., using eval_cfg.training.dict()/eval_cfg.training.model_dump() or dataclasses.asdict(eval_cfg.training), or a TrainingConfig.from_... helper) and let Python unpack those keys into TrainingConfig(**...). Keep the symbols TrainingConfig and eval_cfg.training and ensure you use the appropriate method for the config type (pydantic/dataclass) and include exclude_unset/exclude_none if needed to preserve intended defaults.
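The suggested collapse can be checked in isolation. This sketch uses a trimmed, hypothetical field set and assumes both sides are plain dataclasses with matching field names:

```python
import dataclasses

@dataclasses.dataclass
class EvalTrainingConfig:  # trimmed, illustrative field set
    learning_rate: float = 1e-4
    teacher_top_k: int = 50

@dataclasses.dataclass
class TrainingConfig:
    learning_rate: float
    teacher_top_k: int

cfg = EvalTrainingConfig()
# asdict() returns {"learning_rate": 0.0001, "teacher_top_k": 50};
# ** unpacks it into the matching TrainingConfig fields.
training = TrainingConfig(**dataclasses.asdict(cfg))
```

One caveat worth noting: `dataclasses.asdict` recurses into nested dataclasses, so the one-liner is only safe while the training config stays flat; on the other hand, it fails loudly with a `TypeError` if the two schemas drift apart, which is exactly the protection the review asks for.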
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (7)
- claas/eval/README.md
- claas/eval/config.py
- claas/eval/configs/base.yaml
- claas/eval/runner.py
- claas/eval/types.py
- tests/test_eval_config.py
- tests/test_eval_runner.py
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d5aa29cf5b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
claas/eval/runner.py (Outdated)

```python
training_cfg = config.training
if not isinstance(training_cfg, TrainingConfig):
    raise TypeError(f"EvalConfig.training must be TrainingConfig, got {type(training_cfg)!r}")
```
Validate training overrides before starting eval loop
This new runtime check only enforces that `config.training` is a `TrainingConfig`, not that its values are valid. Invalid Hydra overrides (for example `training.is_clip=0`) now make every `/v1/feedback` call fail with 422, while the loop continues under the existing `except httpx.HTTPError` path and still records step outputs. In that scenario the run appears to progress but applies no distillation updates, which can silently invalidate experiment conclusions; the eval side should fail fast on invalid training values before entering the step loop.
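A minimal fail-fast sketch along these lines, assuming the training fields named elsewhere in this review; the helper name and the exact bounds are hypothetical, not the project's actual validation contract:

```python
import math
from types import SimpleNamespace

def validate_training_overrides(training) -> None:
    """Raise before the step loop starts, instead of letting every /v1/feedback call 422."""
    if not isinstance(training.is_clip, bool):
        raise TypeError(f"training.is_clip must be a bool, got {training.is_clip!r}")
    for name in ("learning_rate", "alpha", "max_grad_norm", "kl_reg_weight"):
        value = getattr(training, name)
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)) or not math.isfinite(value):
            raise ValueError(f"training.{name} must be a finite number, got {value!r}")
    if not isinstance(training.teacher_top_k, int) or isinstance(training.teacher_top_k, bool) \
            or training.teacher_top_k <= 0:
        raise ValueError(f"training.teacher_top_k must be a positive int, got {training.teacher_top_k!r}")

overrides = SimpleNamespace(is_clip=True, learning_rate=1e-4, alpha=0.5,
                            max_grad_norm=1.0, kl_reg_weight=0.0, teacher_top_k=50)
validate_training_overrides(overrides)  # valid config passes silently
```

Calling this once before the step loop turns the silent-no-update failure mode into an immediate, attributable crash at startup.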
🧹 Nitpick comments (1)
tests/test_api.py (1)
Lines 275-327: Consider parameterizing boundary tests across all validated training fields. The range currently exercises `is_clip` and mismatch checks; a small parametrized matrix for `learning_rate`, `alpha`, `max_grad_norm`, `kl_reg_weight`, and `teacher_top_k` would better protect `_validate_training_config` from regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_api.py` around lines 275 - 327, Extend the existing tests that call _validate_training_config by parameterizing the invalid-value and mismatch checks currently in test_feedback_rejects_invalid_training_config and test_feedback_rejects_mismatched_training_config: add pytest.mark.parametrize over the fields ["is_clip","learning_rate","alpha","max_grad_norm","kl_reg_weight","teacher_top_k"] and for each field send a request with an out-of-bound/invalid value (e.g., wrong type or negative where not allowed) and assert 422 with the field name in the error, and add a second parametrized test that submits two requests with differing valid values for the same field and assert 400 with detail "all requests must use the same training config"; keep using _mock_config(monkeypatch) and TestClient(web_app) and reference the existing test function names to replace/augment so the new parameterized tests exercise _validate_training_config for every listed training field.
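The suggested matrix can be sketched as follows. Here `check_training_config` is a hypothetical stand-in for `_validate_training_config`, and the per-field bounds are assumptions chosen for illustration, not the API's documented contract:

```python
import pytest

VALID = {"learning_rate": 1e-4, "alpha": 0.5, "is_clip": True,
         "max_grad_norm": 1.0, "kl_reg_weight": 0.1, "teacher_top_k": 50}

# Assumed per-field validity predicates.
CHECKS = {
    "learning_rate": lambda v: isinstance(v, float) and v > 0,
    "alpha": lambda v: isinstance(v, float) and 0.0 <= v <= 1.0,
    "is_clip": lambda v: isinstance(v, bool),
    "max_grad_norm": lambda v: isinstance(v, float) and v > 0,
    "kl_reg_weight": lambda v: isinstance(v, float) and v >= 0,
    "teacher_top_k": lambda v: isinstance(v, int) and not isinstance(v, bool) and v > 0,
}

def check_training_config(cfg: dict) -> None:
    for field, ok in CHECKS.items():
        if not ok(cfg[field]):
            raise ValueError(f"invalid value for {field}: {cfg[field]!r}")

@pytest.mark.parametrize("field,bad", [
    ("learning_rate", -1.0), ("alpha", 2.0), ("is_clip", 0),
    ("max_grad_norm", 0.0), ("kl_reg_weight", -0.5), ("teacher_top_k", 0),
])
def test_rejects_invalid_training_field(field, bad):
    # Flip one field at a time to an invalid value; the error names the field.
    with pytest.raises(ValueError, match=field):
        check_training_config({**VALID, field: bad})
```

One parametrized test then covers every validated field, so adding a new training field only requires one new `CHECKS` entry and one new parameter row.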
📒 Files selected for processing (11)
- claas/api.py
- claas/core/types.py
- claas/eval/README.md
- claas/eval/__main__.py
- claas/eval/config.py
- claas/eval/configs/base.yaml
- claas/eval/runner.py
- claas/eval/types.py
- tests/test_api.py
- tests/test_eval_config.py
- tests/test_eval_runner.py
🚧 Files skipped from review as they are similar to previous changes (3)
- claas/eval/configs/base.yaml
- claas/eval/README.md
- tests/test_eval_runner.py